noncebalancer: use endpointsharding, ignore ready status #8679
Merged
Conversation
The old noncebalancer only saw READY SubConns, which was a problem during the brief periods when a SubConn needed to reconnect (for instance due to a GOAWAY from the server). Unfortunately, READY SubConns are all that the balancer interface provides, and we can't get it to pass non-READY SubConns to our picker without reimplementing or copying all its SubConn management logic.

Luckily, grpc provides the [`endpointsharding`] balancer implementation that does exactly what we want. It maintains a collection of child balancers, each owning a single endpoint (note: for our purposes an endpoint is equivalent to a single address, though it can be one-to-many). It also lets us query the [state] of each child, including the endpoint it's responsible for.

This allows us to construct a picker that is aware of all available backends, even those that aren't currently READY. That, in turn, prevents us from temporarily serving errors while a given nonce redemption backend reconnects.

To see an example of `endpointsharding` in use, see the [`customroundrobin`] implementation.

For more context on how `endpointsharding` came to be implemented, see [gRFC A61: IPv4 and IPv6 Dualstack Backend Support][a61].

[`endpointsharding`]: https://pkg.go.dev/google.golang.org/grpc/balancer/endpointsharding
[state]: https://pkg.go.dev/google.golang.org/grpc/balancer/endpointsharding#ChildState
[a61]: https://github.com/grpc/proposal/blob/master/A61-IPv4-IPv6-dualstack-backends.md
[`customroundrobin`]: https://github.com/grpc/grpc-go/blob/99f36d4a0c28bc967a8d3fe23ebc2a264b322070/examples/features/customloadbalancer/client/customroundrobin/customroundrobin.go
Contributor
Author
Back in draft because I'm currently implementing the config-based switching between implementations.
Set maxConnectionAge to 1s, and make nonce_test.go collect 300 nonces, then redeem them one at a time, separated by 10ms. This creates a high likelihood of a redemption request occurring during a reconnect.
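The pacing described in this commit can be sketched roughly as below. This is not the code in nonce_test.go: `redeemPaced`, `get`, and `redeem` are hypothetical names, and the in-memory nonce store in `main` stands in for the real nonce service. It only illustrates the shape of the test: collect every nonce first, then spread the redemptions out so that some of them are likely to overlap a reconnect.

```go
package main

import (
	"fmt"
	"time"
)

// redeemPaced collects n nonces via get, then redeems them one at a time,
// sleeping for gap between consecutive redemptions. It returns how many
// redemptions failed. All names here are hypothetical.
func redeemPaced(n int, gap time.Duration, get func() (string, error), redeem func(string) error) (int, error) {
	nonces := make([]string, 0, n)
	for i := 0; i < n; i++ {
		nonce, err := get()
		if err != nil {
			return 0, fmt.Errorf("collecting nonce %d: %w", i, err)
		}
		nonces = append(nonces, nonce)
	}

	failures := 0
	for i, nonce := range nonces {
		if i > 0 {
			time.Sleep(gap)
		}
		if err := redeem(nonce); err != nil {
			failures++
		}
	}
	return failures, nil
}

func main() {
	// An in-memory stand-in for the nonce service (assumption for this sketch).
	issued := map[string]bool{}
	next := 0
	get := func() (string, error) {
		next++
		nonce := fmt.Sprintf("nonce-%d", next)
		issued[nonce] = true
		return nonce, nil
	}
	redeem := func(nonce string) error {
		if !issued[nonce] {
			return fmt.Errorf("unknown nonce %q", nonce)
		}
		delete(issued, nonce)
		return nil
	}

	// 300 nonces, 10ms apart, as described in the commit message.
	failures, err := redeemPaced(300, 10*time.Millisecond, get, redeem)
	fmt.Println("failures:", failures, "err:", err)
}
```

With a 1s maxConnectionAge and roughly 3s of paced redemptions, several reconnects should fall inside the redemption window, which is the point of the change.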
Contributor
Author
Ready for review. The new noncebalancer is selectable by setting in wfe2.json:
beautifulentropy
previously approved these changes
Mar 18, 2026
Member
beautifulentropy
left a comment
Great work on this! Using endpointsharding is a really clean way to get visibility into non-READY backends without reimplementing SubConn management. I have just one optional comment, let me know what you think.
Contributor
@jsha, this PR appears to contain configuration and/or SQL schema changes. Please ensure that a corresponding deployment ticket has been filed with the new values.
aarongable
previously approved these changes
Mar 19, 2026
beautifulentropy
approved these changes
Mar 19, 2026
The old noncebalancer only saw READY SubConns, which was a problem during the brief periods when a SubConn was reconnecting (for instance due to a GOAWAY from the server), since nonce redemption requests are not fungible between backends. Unfortunately, READY SubConns are all that the balancer interface provides. And we can't get that interface to pass non-READY SubConns to our picker without reimplementing or copying all its SubConn management logic.
Luckily, grpc provides the `endpointsharding` balancer implementation that does exactly what we want. It maintains a collection of child balancers each owning a single endpoint (note: for our setup an endpoint is equivalent to a single address, though it can be one-to-many). It also lets us query the state of each child, including the endpoint it's responsible for.

This allows us to construct a picker that is aware of all available backends, even those that aren't currently READY. That, in turn, prevents us from temporarily serving errors while a given nonce redemption backend is reconnecting.
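To make the difference concrete, here is a minimal, self-contained model of the two picking strategies. It does not use the real grpc types: `connState`, `childState`, `readyOnlyPick`, and `allChildrenPick` are hypothetical stand-ins, and the choice to return a "queue" error for a known but non-READY backend is an assumption about how such a picker could avoid failing the request (in grpc-go, a picker returning `balancer.ErrNoSubConnAvailable` makes the RPC wait for a new picker).

```go
package main

import (
	"errors"
	"fmt"
)

// connState is a simplified stand-in for gRPC's connectivity states.
type connState int

const (
	ready connState = iota
	connecting
)

// childState is a hypothetical, simplified stand-in for the per-child
// information exposed by endpointsharding: an address and its state.
type childState struct {
	addr  string
	state connState
}

var (
	// errUnknownBackend: the nonce names a backend the picker cannot see.
	errUnknownBackend = errors.New("nonce backend not known to this picker")
	// errQueue models "wait for a new picker instead of failing the RPC".
	errQueue = errors.New("backend known but not READY; queue the RPC")
)

// readyOnlyPick models the old behaviour: non-READY backends are invisible,
// so a reconnecting backend is indistinguishable from a nonexistent one.
func readyOnlyPick(children []childState, dest string) (string, error) {
	for _, c := range children {
		if c.state == ready && c.addr == dest {
			return c.addr, nil
		}
	}
	return "", errUnknownBackend
}

// allChildrenPick models the new behaviour: every child is visible, so a
// reconnecting backend can be told apart from an unknown one.
func allChildrenPick(children []childState, dest string) (string, error) {
	for _, c := range children {
		if c.addr != dest {
			continue
		}
		if c.state == ready {
			return c.addr, nil
		}
		return "", errQueue
	}
	return "", errUnknownBackend
}

func main() {
	children := []childState{
		{addr: "10.0.0.1:9101", state: ready},
		{addr: "10.0.0.2:9101", state: connecting},
	}
	_, oldErr := readyOnlyPick(children, "10.0.0.2:9101")
	_, newErr := allChildrenPick(children, "10.0.0.2:9101")
	fmt.Println("ready-only picker:", oldErr)
	fmt.Println("all-children picker:", newErr)
}
```

In this model the reconnecting backend at `10.0.0.2:9101` produces a hard error under the ready-only strategy, while the all-children strategy can recognize it and defer the request.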
To see another example of `endpointsharding` in use, see the `customroundrobin` implementation.

For more context on how `endpointsharding` came to be implemented, see gRFC A61: IPv4 and IPv6 Dualstack Backend Support.

If you're curious how `endpointsharding` passes around the information about non-READY SubConns, it uses a type assertion from a `balancer.Picker` to its internal type.

Alternative to #8672. Fixes #8662.
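The type-assertion technique mentioned above can be illustrated with stand-in types. This is a sketch of the general Go pattern, not the `endpointsharding` source: `picker`, `pickerWithChildren`, `plainPicker`, and `childrenFromPicker` are hypothetical names chosen for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// picker is a simplified stand-in for the balancer.Picker interface.
type picker interface {
	pick() (string, error)
}

// pickerWithChildren is a hypothetical concrete picker that carries extra
// per-child information alongside its normal picking behaviour.
type pickerWithChildren struct {
	children []string
}

func (p *pickerWithChildren) pick() (string, error) {
	if len(p.children) == 0 {
		return "", errors.New("no children")
	}
	return p.children[0], nil
}

// plainPicker is any other picker that carries no extra information.
type plainPicker struct{}

func (plainPicker) pick() (string, error) { return "plain", nil }

// childrenFromPicker recovers the extra information by asserting the
// interface value back to the concrete type. It returns nil when the picker
// was not produced by the balancer that attaches children.
func childrenFromPicker(p picker) []string {
	pc, ok := p.(*pickerWithChildren)
	if !ok {
		return nil
	}
	return pc.children
}

func main() {
	var a picker = &pickerWithChildren{children: []string{"10.0.0.1:9101", "10.0.0.2:9101"}}
	var b picker = plainPicker{}
	fmt.Println(childrenFromPicker(a))
	fmt.Println(childrenFromPicker(b) == nil)
}
```

The upside of this pattern is that the extra state rides along on a value that already flows through the existing interface; the cost is that callers depend on the concrete type behind it, so a picker from any other source yields nothing.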
This edits `noncebalancer.go` in place for ease of diffing, and also copies the original `grpc/noncebalancer` (with no edits) to `grpc/noncebalancerv1`. But don't take my word for it: